Gammatone-domain model combination for consonant recognition in noisy environments
Abstract
In this paper, a gammatone-domain model combination method is proposed for consonant recognition in noisy environments. For this task, we first define a gammatone cepstral coefficient (GCC) as the cepstral representation of the averaged envelopes of a gammatone-filtered signal. We then search for a suitable phonetic unit by comparing monophone, diphone, and triphone acoustic models; consonant recognition experiments show that diphone hidden Markov models (HMMs) provide the best performance. Next, a gammatone-domain model combination method is developed to combine the clean and noise models in the linear gammatone-envelope domain. We evaluate the GCC-based feature and the proposed model combination on intervocalic English consonants (VCV) covering 24 different consonants. Experiments show that the GCC-based feature achieves a recognition rate that is 47.46% higher, in relative terms, than that of mel-frequency cepstral coefficients (MFCCs). In addition, applying the model combination to the GCC-based diphone HMM system yields a relative accuracy improvement of 77.67% under noisy conditions.
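The two ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a fourth-order FIR gammatone filterbank with Glasberg-Moore ERB bandwidths, rectification followed by per-frame averaging as the envelope estimate, and a plain DCT-II for the cepstral step. The model combination is shown only for mean vectors, using the standard log-add approximation, which adds clean and noise envelopes in the linear domain and returns to the log domain. The function names (gcc, combine_log_means), the filter count, the frame sizes, and the gain parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula; an assumption here)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    # Truncated gammatone impulse response, normalized to unit energy
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gcc(signal, fs, n_filters=32, n_ceps=13, frame_len=0.025, hop=0.010, fmin=100.0):
    # Centre frequencies equally spaced on the ERB-rate scale (illustrative choice)
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    fcs = from_erb(np.linspace(to_erb(fmin), to_erb(0.45 * fs), n_filters))

    L, H = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(signal) - L) // H
    env = np.empty((n_frames, n_filters))
    for k, fc in enumerate(fcs):
        # Rectified filter output serves as a crude envelope estimate
        y = np.abs(np.convolve(signal, gammatone_ir(fc, fs), mode="same"))
        for m in range(n_frames):
            env[m, k] = y[m * H : m * H + L].mean()  # averaged envelope per frame

    log_env = np.log(env + 1e-10)
    # DCT-II basis maps log envelopes to cepstral coefficients
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return log_env @ basis.T

def combine_log_means(mu_clean_log, mu_noise_log, gain=1.0):
    # Log-add approximation: log(exp(clean) + gain * exp(noise)), means only
    return np.logaddexp(mu_clean_log, np.log(gain) + mu_noise_log)
```

A call such as gcc(x, fs=8000) returns one row of cepstral coefficients per frame. The combination function operates on log-envelope means, so cepstral model means would first have to be mapped back to the log-envelope domain (for example with an inverse DCT over all filterbank channels) before combining; variances are ignored in this sketch.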
Similar resources
Voice biometric feature using Gammatone filterbank and ICA
Voice biometric feature extraction is the core task in developing any speaker identification system. This paper proposes a robust feature extraction technique for the purpose of speaker identification. The technique is based on processing monaural speech signal using human auditory system based Gammatone Filterbank (GTF) and Independent Component Analysis (ICA). The measures used to assess the ...
Extracting MFCC Features For Emotion Recognition From Audio Speech Signals
A major challenge for automatic speech recognition (ASR) relates to significant performance reduction in noisy environments. Recent research has shown that auditory features based on Gammatone filters are promising to improve robustness of ASR systems against noise, though the research is far from extensive and generalizability of the new features is unknown. This paper presents our implementat...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Improved parallel model combination based on better domain transformation for speech recognition under noisy environments
The parallel model combination (PMC) technique has been shown to achieve very good performance for speech recognition under noisy conditions. However, there still exist some problems based on the PMC formula. In this paper, we first investigated these problems and some modifications on the transformation process of PMC were proposed. Experimental results show that this modified PMC can provide ...
Whether MFCC or GFCC Is Better for Recognizing Emotion from Speech? A Study
A major challenge for automatic speech recognition (ASR) relates to significant performance reduction in noisy environments. Recently, the study of the emotional content of speech signals has gained importance, and many systems have been proposed to identify the emotional content of a spoken utterance. The important aspects of the design of a speech emotion recognition system are pre-proces...